Now that we know how to pull in data, check and transform Coordinate Reference Systems (CRS), and plot sf data.frames together - let’s practice doing the same thing with other geometry types. In this notebook we’ll be bringing in bike boulevards and schools, which will get us primed to think about spatial relationship queries.
Instructor Notes
library(sf)
## Linking to GEOS 3.8.1, GDAL 3.1.4, PROJ 6.3.1
library(tmap)
We’re going to bring in data bike boulevards in Berkeley. Note two things that are different from our previous data:
We’re bringing in a GeoJSON this time and not a shapefile
We have a line geometry GeoDataFrame (our county and states data had polygon geometries)
bike_blvds = st_read('notebook_data/transportation/BerkeleyBikeBlvds.geojson')
## Reading layer `BerkeleyBikeBlvds' from data source `/Users/hikarimurayama/Documents/repos/Geospatial-Fundamentals-in-R-with-sf/notebook_data/transportation/BerkeleyBikeBlvds.geojson' using driver `GeoJSON'
## Simple feature collection with 211 features and 10 fields
## geometry type: MULTILINESTRING
## dimension: XY
## bbox: xmin: 561541.2 ymin: 4189007 xmax: 566451.6 ymax: 4193483
## projected CRS: WGS 84 / UTM zone 10N
plot(bike_blvds$geometry)
Of course, we could also use tmap to plot our lines:
# set to view mode
tmap_mode('view')
## tmap mode set to interactive viewing
tm_shape(bike_blvds) +
tm_lines()
As usual, we’ll want to do our usual data exploration…
head(bike_blvds)
## Simple feature collection with 6 features and 10 fields
## geometry type: MULTILINESTRING
## dimension: XY
## bbox: xmin: 562293.8 ymin: 4189795 xmax: 562786.5 ymax: 4189899
## projected CRS: WGS 84 / UTM zone 10N
## BB_STRNAM BB_STRID BB_FRO BB_TO BB_SECID DIR_ Status ALT_bikeCA
## 1 Heinz/Russell RUS 7th 8th RUS01 E/W Existing No
## 2 Heinz/Russell RUS 8th 9th RUS02 E/W Ezisting No
## 3 Heinz/Russell RUS 9th 10th RUS03 E/W Existing No
## 4 Heinz/Russell RUS 10th San Pablo RUS04 E/W Existing No
## 5 San Pablo RUS Heinz Russell RUS05 N/S Existing No
## 6 Russell RUS San Pablo Wallace RUS06 E/W Exisitng 3
## Shape_len len_km geometry
## 1 101.12817 0.101 MULTILINESTRING ((562293.8 ...
## 2 100.81407 0.101 MULTILINESTRING ((562391.6 ...
## 3 100.03740 0.100 MULTILINESTRING ((562489 41...
## 4 106.59288 0.107 MULTILINESTRING ((562585.7 ...
## 5 89.56348 0.090 MULTILINESTRING ((562688.9 ...
## 6 76.95699 0.077 MULTILINESTRING ((562711.4 ...
dim(bike_blvds)
## [1] 211 11
colnames(bike_blvds)
## [1] "BB_STRNAM" "BB_STRID" "BB_FRO" "BB_TO" "BB_SECID"
## [6] "DIR_" "Status" "ALT_bikeCA" "Shape_len" "len_km"
## [11] "geometry"
Our bike boulevard data includes the following information:
BB_STRNAM - bike boulevard StreetnameBB_STRID - bike boulevard Street IDBB_FRO - bike boulevard origin streetBB_TO - bike boulevard end streetBB_SECID- bike boulevard section idDIR_ - cardinal directions the bike boulevard runsStatus - status on whether the bike boulevard existsALT_bikeCA - ?Shape_len - length of the boulevard in meterslen_km - length of the boulevard in kilometersgeometry Question
Why are there 211 features when we only have 8 bike boulevards?
And now take a look at our CRS…
st_crs(bike_blvds)
## Coordinate Reference System:
## User input: WGS 84 / UTM zone 10N
## wkt:
## PROJCRS["WGS 84 / UTM zone 10N",
## BASEGEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]],
## CONVERSION["UTM zone 10N",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",-123,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.9996,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",500000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["unknown"],
## AREA["World - N hemisphere - 126°W to 120°W - by country"],
## BBOX[0,-126,84,-120]],
## ID["EPSG",32610]]
Let’s tranform our CRS to UTM Zone 10N, NAD83 that we used in the last lesson.
bike_blvds_utm10 = st_transform(bike_blvds, crs = 26910)
head(bike_blvds_utm10)
## Simple feature collection with 6 features and 10 fields
## geometry type: MULTILINESTRING
## dimension: XY
## bbox: xmin: 562293.8 ymin: 4189795 xmax: 562786.5 ymax: 4189899
## projected CRS: NAD83 / UTM zone 10N
## BB_STRNAM BB_STRID BB_FRO BB_TO BB_SECID DIR_ Status ALT_bikeCA
## 1 Heinz/Russell RUS 7th 8th RUS01 E/W Existing No
## 2 Heinz/Russell RUS 8th 9th RUS02 E/W Ezisting No
## 3 Heinz/Russell RUS 9th 10th RUS03 E/W Existing No
## 4 Heinz/Russell RUS 10th San Pablo RUS04 E/W Existing No
## 5 San Pablo RUS Heinz Russell RUS05 N/S Existing No
## 6 Russell RUS San Pablo Wallace RUS06 E/W Exisitng 3
## Shape_len len_km geometry
## 1 101.12817 0.101 MULTILINESTRING ((562293.8 ...
## 2 100.81407 0.101 MULTILINESTRING ((562391.6 ...
## 3 100.03740 0.100 MULTILINESTRING ((562489 41...
## 4 106.59288 0.107 MULTILINESTRING ((562585.7 ...
## 5 89.56348 0.090 MULTILINESTRING ((562688.9 ...
## 6 76.95699 0.077 MULTILINESTRING ((562711.4 ...
Alright! Now that we have our bike boulevard data squared away, we’re going to bring in our Alameda County school data.
schools_df = read.csv('notebook_data/alco_schools.csv')
head(schools_df)
## X Y Site Address City
## 1 -122.2388 37.74476 Amelia Earhart Elementary 400 Packet Landing Rd Alameda
## 2 -122.2519 37.73900 Bay Farm Elementary 200 Aughinbaugh Way Alameda
## 3 -122.2589 37.76206 Donald D. Lum Elementary 1801 Sandcreek Way Alameda
## 4 -122.2348 37.76525 Edison Elementary 2700 Buena Vista Ave Alameda
## 5 -122.2381 37.75396 Frank Otis Elementary 3010 Fillmore St Alameda
## 6 -122.2616 37.76911 Franklin Elementary 1433 San Antonio Ave Alameda
## State Type API Org
## 1 CA ES 933 Public
## 2 CA ES 932 Public
## 3 CA ES 853 Public
## 4 CA ES 927 Public
## 5 CA ES 894 Public
## 6 CA ES 893 Public
dim(schools_df)
## [1] 550 9
Questions Without looking ahead:
This is not an sf data.frame! A couple of clues to figure that out are..
Although our school data is not starting off as an sf data.frame, we actually have the tools and information to make it one. Using the st_as_sf function, we can coerce our plain data.frame into an sf data.frame (specifying the columns containings the points’ coordinates and the EPSG code of the CRS).
schools_sf = st_as_sf(schools_df,
coords = c('X', 'Y'),
crs = 4326)
head(schools_sf)
## Simple feature collection with 6 features and 7 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -122.2616 ymin: 37.739 xmax: -122.2348 ymax: 37.76911
## geographic CRS: WGS 84
## Site Address City State Type API Org
## 1 Amelia Earhart Elementary 400 Packet Landing Rd Alameda CA ES 933 Public
## 2 Bay Farm Elementary 200 Aughinbaugh Way Alameda CA ES 932 Public
## 3 Donald D. Lum Elementary 1801 Sandcreek Way Alameda CA ES 853 Public
## 4 Edison Elementary 2700 Buena Vista Ave Alameda CA ES 927 Public
## 5 Frank Otis Elementary 3010 Fillmore St Alameda CA ES 894 Public
## 6 Franklin Elementary 1433 San Antonio Ave Alameda CA ES 893 Public
## geometry
## 1 POINT (-122.2388 37.74476)
## 2 POINT (-122.2519 37.739)
## 3 POINT (-122.2589 37.76206)
## 4 POINT (-122.2348 37.76525)
## 5 POINT (-122.2381 37.75396)
## 6 POINT (-122.2616 37.76911)
dim(schools_sf)
## [1] 550 8
You’ll notice that the shape is almost the same as what we had as a data.frame, except with one less column (because the two coordinate columns, X, and Y, were consumed into a single geometry column.
Now that it’s an sf data.frame, we can use the fancy plot method for it just as we did for our other data sets. Notice that this is our first point dataset.
plot(schools_sf)
But of course we’ll want to transform the CRS, so that we can later plot it with our bike boulevard data.
schools_utm10 = st_transform(schools_sf, crs = 26910)
And keep in mind that we can always use tmap to plot any of our datasets.
Here’s how we’d use tmap for point data:
tm_shape(schools_utm10) +
tm_dots(col='green', size=0.2)
In Lesson 2 we discussed that you can save out sf data.frames in multiple file formats. You could opt for a GeoJSON, a shapefile, etc… However, for point data sets it is also an option to save it out as a CSV since each geometry only has a single X and single Y value.
Let’s play around with another points dataset.
In the code cell provided below, compose code to:
notebook_data/parcels/parcel_pts_rand30pct.geojson)tmap to plot and customize as desired!To see the solution, look at the hidden text below.
# YOUR CODE HERE:
No matter the geometry type we have for our sf data.frame, we can create overlay plots.
Since we’ve already done the legwork of transforming our CRS, we can go ahead and plot them together.
tm_shape(schools_utm10) +
tm_dots(size=0.1) +
tm_shape(bike_blvds_utm10) +
tm_lines(col='red')
If we want to answer questions like “What schools are close to bike boulevards in Berkeley?”, the above plot isn’t super helpful, since the extent covers all of Alameda county.
Luckily, it is easy for us to crop a sf data.frame, so that we only retain the rows whose geometries are within the bounding box (or extent) of another dataset.
We do this with the st_crop function.
schools_utm10_crop = st_crop(schools_utm10, bike_blvds_utm10)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
Now what’s see what that last plot looks like using our cropped data.
tm_shape(schools_utm10_crop) +
tm_dots(size=0.1) +
tm_shape(bike_blvds_utm10) +
tm_lines(col='red')
In this lesson we learned a several new skills:
sf data.framest_cropLet’s take some time to practice reading in and reconciling new datasets, then mapping them together.
In the code cell provided below, write code to:
notebook_data/berkeley/BerkeleyCityLimits.shp)BONUS: Add the Berkeley outline to your last plot!
To see the solution, look at the hidden text below.
# YOUR CODE HERE: